Exact sampling from non-attractive distributions using summary states 
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Propp and Wilson's method of coupling from the past allows one to efficiently generate exact 
samples from attractive statistical distributions (e.g., the ferromagnetic Ising model). This method 
may be generalized to non-attractive distributions by the use of summary states, as first described by 
Huber. Using this method, we present exact samples from a frustrated antiferromagnetic triangular 
Ising model and the antiferromagnetic q = 3 Potts model. We discuss the advantages and limitations 
of the method of summary states for practical sampling, paying particular attention to the slowing 
down of the algorithm at low temperature. In particular, we show that such a slowing down can 
occur in the absence of a physical phase transition. 



I. INTRODUCTION 

In many statistical problems, physical and otherwise, it 
is useful to be able draw samples from a complex distribu- 
tion. For example, in statistical physics one is interested 
in the Boltzmann distribution 



(1) 



where E{a) describes the energy of a system in configu- 
ration er, (3 is the inverse temperature (we set ks = 1). 
and Z is a normalizing constant (the partition function) . 
In general, E{a) may be easy to evaluate for a particular 
configuration, but the number of possible configurations 
makes it impractical to draw directly from the distribu- 
tion. Yet some efficient method of sampling is desirable, 
as this would allow one to calculate properties of the 
system that might not be easily computed by analytical 
means. 

In traditional Monte Carlo sampling methods, such 
as the Metropolis-Hastings method and Gibbs sam- 
pling Q (also known as the heat bath algorithm), one 
constructs an ergodic Markov chain whose stationary dis- 
tribution is the desired distribution. By starting in some 
state and evolving the chain for a sufficiently long time, 
one can approximate a sample from the desired distribu- 
tion. Unfortunately, such a sample is exact only in the 
limit of infinite time. In practice, it is often difficult to 
determine how long to wait to achieve sufficiently good 
samples, and one inevitably either produces poor samples 
or wastes time by running the Markov chain for longer 
than necessary. 

However, in 1996, Propp and Wilson demonstrated the 
possibility of exact sampling by the method of coupling 
from the past, allowing one to produce perfect samples 
in a finite number of steps [3j. In the most general 
case, their method requires the infeasible task of run- 
ning a Markov chain for every possible initial state of the 
system. But for certain distributions, termed attractive 



(such as a ferromagnetic Ising model), Propp and Wil- 
son showed that the task may be greatly simplified by 
tracking only extremal states, permitting the practical 
calculation of exact samples. This method was gener- 
alized to anti-attractive distributions by Haggstrom and 
Nelander jf| . More recently, Huber showed that one may 
instead track only a single state that summarizes one's 
knowledge of the system |J . Because it does not require 
that the states be partially ordered, this last method is 
applicable to non-attractive distributions. 

Using the summary state method, we have drawn ex- 
act samples from the antiferromagnetic triangular Ising 
model and from the Potts model. Fig. [j] shows one such 
sample. In § [ll| w e describe the methods that make this 
possible. In §QI, we briefly discuss the Ising and Potts 
models. We present results from the exact sampling of 
these models in §[V. Finally, we discuss the convergence 
properties of the summary state method, and we suggest 
practical generalizations. 




FIG. 1. An exact sample from a triangular Ising antifer- 
romagnet with 14,400 spins at (3~ = 4.9 and zero applied 
magnetic field. 
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II. COUPLING FROM THE PAST AND THE 
SUMMARY STATE METHOD 

Propp and Wilson's method of coupling from the 
past || is based on the observation that, for a fixed choice 
of the random numbers used to propagate a Markov 
chain, its possible paths in state space may ultimately 
coalesce into a single trajectory. Once two initial states 
lead to the same state, they will remain in lock step there- 
after. 

Consider simulating a Markov chain from every pos- 
sible initial state at some fixed time t = —T, with the 
goal of taking a sample at t = 0. If all the chains coa- 
lesce before t = 0, then this finite procedure yields the 
same results as a Monte Carlo simulation started at an 
infinite time in the past, so the result is an exact sample. 
If the chains fail to coalesce, one can simply double the 
starting time to — 2T, reusing the random numbers for 
the interval [—t, 0] (i.e., treating the random numbers as 
a function of simulation time), and repeat until coales- 
cence is achieved. 

Having to follow every possible state would make this 
method exponentially intractable. But for problems that 
admit a partial ordering of the states and which are "at- 
tractive" — that is, which preserve the ordering under 
evolution of the Markov chain — the computation can 
be vastly simplified by tracking only the extremal states. 
An example of an attractive system is the ferromagnetic 
Ising model, in which it is energetically favorable for spins 
to align with each other. 

Hubcr H and Harvey and Neal || have shown that 
the method of Propp and Wilson may be generalized us- 
ing a single summary state instead of a pair of extremal 
states. This single state summarizes one's knowledge of 
the possible states of the system, allowing the state of 
some subsystems to be uncertain. 

For example, suppose the system is a collection of vari- 
ables <7j taking on the values {±1}- Conventional Gibbs 
updating sets 



-I if u < P{<Ji = 
-1 if u > P(ai = -1 



(2) 



where u is uniformly distributed on [0, 1] and Oi denotes 
the set of all variables but the ith. To implement sum- 
mary states, we allow each variable to take on the addi- 
tional value ? which indicates uncertainty. We then run 
a modified Markov chain on this system: Oi is updated 
according to Eq. (Q) if the result is the same for any pos- 
sible assignment of ±1 to the ?'s in af, otherwise, cr^ i— > ?. 
As in the Propp and Wilson method, we run the chain 
from successively longer times in the past with random 
numbers as a function of simulation time. When no vari- 
ables remain in ? states, the algorithm has converged, 
and we may take a sample at t = 0. 

For the case of attractive distributions, this procedure 
is exactly equivalent to the Propp and Wilson scheme. 
The value ? denotes variables that differ between the 



maximal and minimal states, and removal of all ? states 
corresponds to coalescence of the bounding chains. How- 
ever, using a single summary state, there is no require- 
ment that the states be ordered in any way. Thus 
the summary state method can also be applied to non- 
attractive distributions — for example, the antiferromag- 
netic Ising model. 

Although the samples returned by this method are ex- 
act, the algorithm does not necessarily converge after a 
reasonable amount of time. Huber has shown that for an- 
tiferromagnetic spin systems at sufficiently high temper- 
ature, the expected running time of the algorithm is poly- 
nomial in the number of spins However, for systems 
with a phase transition, the convergence time diverges as 
a power law at the critical temperature, a phenomenon 
known as critical slowing down Q| . 



III. THE ISING AND POTTS MODELS 

Consider the Hamiltonian 

E(<t) = — - Jmn<Jm<7n — ^ H m (T m , (3) 

m,n m 

where J mn is the coupling between spins m and n and 
H m is the value of an external magnetic field at the lo- 
cation of spin m. The appropriate Markov chain update 
rule is Eq. (0) with 



P(a i = ±l\a i ) = 



-/3E(o 



--±1) 



e -/3E(a i = + l) _|_ e -0E{<Ti=-\) 



(4) 



In the Ising model J mn is taken to be zero un- 
less spins m and n are adjacent, in which case it is some 
constant J. Cases of particular interest are the square 
lattice, in which each spin has four neighbors, and the 
triangular lattice, with six neighbors per spin. In gen- 
eral, the behavior of Ising systems can vary with their 
spin connectivity. For both kinds of lattices, we use pe- 
riodic boundary conditions. Because we may simultane- 
ously update the states of spins whose conditional dis- 
tributions are independent, one iteration of the Markov 
chain consists of two sweeps for the square lattice and 
three for the triangular lattice. 

In this paper, we use the normalization J = ±1. 
,/ = +1 corresponds to the ferromagnetic case, in which 
spins prefer to point in the same direction; J = — 1 corre- 
sponds to the antiferromagnet. As mentioned previously, 
the ferromagnetic case is attractive. The antiferromag- 
net on a square lattice is a special case, because its prop- 
erties are isomorphic to those of a square ferromagnet. 
However, for a triangular lattice, there is no such iso- 
morphism. With six neighbors per spin, there is no way 
to minimize the energy locally at all sites: we say the 
system is frustrated. 

It is well known that a two-dimensional ferromagnetic 
Ising model exhibits a phase transition |J . Below a crit- 
ical temperature /3~ , there is spontaneous symmetry 



2 



breaking, and the system develops a preferred spin orien- 
tation in the absence of any magnetic field. For a square 
lattice, (5~ x — 2.21 . At this temperature, the relaxation 
time of the dynamic system diverges, a phenomenon 
known as critical slowing down 0. Correspondingly, 
there is a divergence in the convergence time for some 
Markov chain Monte Carlo algorithms, such as coupling 
from the past, and exact samples cannot be generated for 
lower temperatures. Note that there is no phase transi- 
tion in the case of a triangular antiferromagnet jio), so 
there cannot be a critical slowing down in the traditional 
sense. 

To circumvent the problem of nonconvergence below 
the critical temperature, Propp and Wilson actually used 
a related system, the random cluster model, to generate 
Ising samples Q. Unfortunately, this model has no ob- 
vious analog in the antiferromagnetic case. 

The Potts model is a generalization of the Ising model 
wherei n sp ins may take on q different values {0, 1, . . . , q— 
1} |PH , |12| . Spins interact only with others of the same 
type. The Hamiltonian is 



E(a) 



S 



m.k 
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Specifically, we consider the antiferromagnetic Potts 
model with q = 3 on a square lattice with zero magnetic 
field. This model has a critical point only at /3 _1 = 0, so 
there is no phase transition ]l3[ ]. 



IV. RESULTS 



A. Ising model 



By implementing the summary state method, we have 
produced exact samples from the Ising and Potts mod- 
els. For example, Fig. |l| shows a sample from a triangular 
Ising antiferromagnet consisting of 120 2 = 14,400 spins 
at = 4.9 with zero applied magnetic field. 

We find that the number of iterations required for the 
algorithm to converge diverges at a threshold tempera- 
ture. We have studied this divergence using a lattice of 
N = 63 2 = 3969 spins. Simulations using larger N (e.g., 
N = 99 2 ) suggest that the outcome is not significantly 
affected by choosing a larger grid size. Fig. || shows the 
divergence, to which we have fitted a power law of the 
form 



t = 



(6) 



We find that the time diverges with an exponent b = 



1.03 ± 0.01 at the threshold temperature /3 t 
0.005. 



4.839 ± 




FIG. 2. Variation with temperature of the number of it- 
erations required for convergence of the summary state algo- 
rithm for the triangular, antiferromagnetic Ising model. Each 
point corresponds to either 500, 1000, or 1500 exact samples, 
with more samples taken at lower temperature. The solid line 
shows the fit to Eq. (^). 

This divergence is an important feature of the sum- 
mary state method. It is qualitatively similar to critical 
slowing down, but note that no physical phase transi- 
tion is involved. In divergent situations the augmented 
Markov chain has a metastable set of distributions with 
many ?'s, such that it is very unlikely for it to enter a 
state with no ?'s. 

To draw an exact sample using the summary state 
method, the system must go from a completely uncer- 
tain state to a completely certain state. Thus, it must 
pass through a state with only a few scattered ?'s. For 
temperatures sufficientlty near the threshold, where we 
know that such a sparse configuration can be reached, we 
might expect that the limiting factor is the probability 
that an isolated ? can cause divergence. 

Therefore, as a very rough estimate, we might suppose 
that the divergence occurs when the probability of a sin- 
gle ? turning one of its six neighbors into a ? rises above 
g. We expect that the neighbors of any given spin Oi 
should be (on average) half up and half down. Replacing 
one of these neighbors by a ?, we may assume the con- 
figuration (TTTII ?) without loss of generality. Then the 
threshold temperature is determined by 
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which has the solution /3 _1 = 4/ In 2 w 5.8. 

To examine the validity of a threshold temperature 
analysis based on the persistence of single ?'s, we com- 
piled statistics on the stability of an equilibrium system 
with a single ? added. Because we cannot create exact 
samples for much of the temperature range of interest, we 
generated approximate samples by simulating for fixed 
time (100 iterations) a random initial state. We then set 
one spin to ? and simulated the system forward. If any 
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uncertainty remained after 500 iterations, we said the 
system diverged. Fig. || shows the fraction of divergent 
trials for various temperatures. As one would expect, 
this fraction goes to zero very near the threshold tem- 
perature. 



B. Potts model 



FIG. 3. Fraction of equilibrium systems that diverged after 
a single ? was added. The data are from 9000 trials at each 
temperature. 

It is also interesting to consider how the algorithm be- 
haves when a uniform nonzero magnetic field H is ap- 
plied. Biasing the spins makes it easier for them to choose 
a particular orientation, so we would expect convergence 
to be easier. Fig. || shows the region of convergence in 
the (/3"\-ff) plane. 



In addition, we have implemented exact sampling of 
the Potts model for arbitrary q. Fig. || shows an exact 
sample with q — 3 for a square antiferromagnetic lattice 
of 100 2 = 10, 000 spins. 




FIG. 5. An exact sample from the q — 3 antiferromagnetic 
Potts model for a 10,000 spin square lattice at = 1.2. 
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FIG. 4. Region of convergence for the summary state algo- 
rithm on the triangular antiferromagnetic Ising model. Points 
in region (1) allow convergence, whereas points in region (2) 
are inaccessible to the algorithm. Each data point corre- 
sponds to 45 searches for the threshold, each using a different 
set of random numbers. 



A naive implementation of the summary state method 
would augment the possible spin values with a single 
?. We refer to this method as algorithm A. However, 
it is possible to retain more information about uncer- 
tain spins: for each spin, we store a binary g-bit vector 
(bx, 62, • • • , b q ), hi e {0,1}. Bit hi is set to one if it is 
possible for the spin to take on the value i: thus the ini- 
tial state of each spin is b = (1, 1, . . . , 1). In updating the 
state of the system, we set hi = only when the spin can- 
not take on the value i for any allowed configuration of 
its neighbors. We refer to the latter method as algorithm 
B. 

To demonstrate the advantage of retaining more in- 
formation in the summary state, we have studied the 
convergence properties of both algorithms. This com- 
parison is shown in Fig. ^, based on data for a square 
64 2 = 4096 spin lattice. As in the Ising study, both 
algorithms lead to a power law divergence with an ex- 
ponent of one (b A = 1-04 ± 0.03, b B = 0.99 ± 0.02). 
However, the threshold temperatures for the two algo- 
rithms are quite different: ff^\ = 2.293 ± 0.005, whereas 
07g = 1.157 ±0.004. As in the Ising example above, nei- 
ther of the divergences corresponds to a physical phase 
transition. 
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FIG. 6. Temperature dependence of the convergence time 
for the square, antiferromagnetic q — 3 Potts model under 
algorithms A and B. Each point corresponds to 1600 exact 
samples on a 4096 spin lattice. The solid lines give fits to 
Eq. @). 



Of course, summary state sampling remains a valu- 
able tool even if it cannot be done below some threshold. 
Where it does work, it is exact. The method also has the 
advantage that we need not even know if the distribu- 
tion in question is attractive — it can be applied to any 
system. 
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V. CONCLUSIONS 

We have demonstrated the usefulness of the summary 
state method for exact sampling from non-attractive dis- 
tributions. In both the antiferromagnetic Ising and Potts 
models, the method works above a certain threshold tem- 
perature, with a power law divergence in the coalescence 
time at the threshold. Although similar to the phe- 
nomenon of critical slowing down, this divergence does 
not occur at a physical phase transition. As the Potts 
example shows, the location of the divergence is a feature 
of the specific implementation of the summary states, not 
of the underlying distribution. We have shown that re- 
taining more information in the summary state will allow 
convergence at lower temperatures. 

Based on this result, we may propose an improved al- 
gorithm for the triangular antiferromagnetic Ising model. 
At lower temperatures, the system should be increasingly 
ordered, and tracking this order might make it easier to 
gain incremental knowledge of the state of the system. 
One idea is to keep track of correlations between spins 
by grouping them into hexagonal clumps of seven, which 
can be used to tile the triangular lattice. Each tile has 
2 7 = 128 possible states. In analogy to the Potts method 
presented earlier (in which a g-bit vector represents the 
uncertainty about a spin), representing each tile with a 
128-bit vector would allow individually tracking the pos- 
sible arrangements of those seven spins. Within a tile, 
the summary state can track anticorrelation, which we 
expect to arise at low temperatures. Each edge of a tile 
can be easily summarized in a 4-bit vector for compari- 
son with its neighboring tiles. Each tile can then update 
its summary state by considering the possible states of 
the neighboring edges. 
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